Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet
Authors
Abstract
Measuring the semantic similarity of multilingual words is a challenging problem in data mining, information extraction, information retrieval, and related fields. This paper introduces an algorithm that measures the semantic similarity of Chinese-English bilingual words based on Chinese WordNet, an expansion of WordNet in Simplified Chinese. The algorithm measures semantic similarity not only between two Chinese words or two English words, but also across languages, between a Chinese word and an English word. It uses WordNet’s hypernym/hyponym relationships between synsets and evaluates similarity from three quantities: the distances between synsets, the local densities of the synsets, and the depths of the synsets in the overall WordNet hierarchy. Because most words have more than one meaning, the algorithm adaptively sets the weights of the combination pairs of the two words’ synsets. Experimental results show that the similarities measured by the algorithm generally match human common sense.
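The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses a tiny hand-made hypernym hierarchy and bilingual lexicon (both hypothetical, not taken from Chinese WordNet), scores synset pairs with a Wu-Palmer-style depth and distance measure in place of the paper's own formula, omits the local-density term, and takes the best-scoring synset pair instead of the paper's adaptive weighting.

```python
from itertools import product

# Hypothetical toy hierarchy: child synset -> hypernym (parent) synset.
HYPERNYM = {
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}

# Hypothetical bilingual lexicon: Chinese and English words share synset ids,
# which is what makes cross-lingual comparison possible.
LEXICON = {
    "dog": ["dog"], "狗": ["dog"],
    "cat": ["cat"], "猫": ["cat"],
    "car": ["car"], "汽车": ["car"],
}


def path_to_root(synset):
    """Return the chain of synsets from `synset` up to the root."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def depth(synset):
    """Depth in the hierarchy, counting the root as depth 1."""
    return len(path_to_root(synset))


def synset_similarity(a, b):
    """Wu-Palmer-style score (an assumption, not the paper's formula):
    2 * depth(lowest common hypernym) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # First shared ancestor when walking up from `a` is the lowest one.
    common = next(s for s in path_to_root(a) if s in ancestors_b)
    return 2 * depth(common) / (depth(a) + depth(b))


def word_similarity(w1, w2):
    """Compare every pair of the two words' synsets and keep the best score.
    The paper weights the pairs adaptively; taking the maximum is a
    simplification used here for illustration only."""
    return max(
        synset_similarity(a, b)
        for a, b in product(LEXICON[w1], LEXICON[w2])
    )


if __name__ == "__main__":
    for pair in [("狗", "dog"), ("狗", "cat"), ("dog", "汽车")]:
        print(pair, round(word_similarity(*pair), 3))
```

In this toy setup a Chinese word and its English translation share a synset and score highest, while words whose only common hypernym is the root score lowest. A fuller version would need a real synset inventory and the density and weighting terms the abstract mentions.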
Similar resources
Sense Extraction and Disambiguation for Chinese Words from Bilingual Terminology Bank
Using lexical semantic knowledge to solve natural language processing problems has been getting popular in recent years. Because semantic processing relies heavily on lexical semantic knowledge, the construction of lexical semantic databases has become urgent. WordNet is the most famous English semantic knowledge database at present; many researches of word sense disambiguation adopt it as a st...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of English language, in which, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
Building A Chinese WordNet Via Class-Based Translation Model
Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguation (WSD). For the study of WSD for English text, researchers have been using different kinds of lexicographic resources, including machine readable dictionaries (MRDs), machine readable thesauri, and bilingual corpora. In recent years, WordNet has become the most widely used resource for the study of...
Sinica BOW (Bilingual Ontological Wordnet): Integration of Bilingual WordNet and SUMO
The Academia Sinica Bilingual Ontological Wordnet (Sinica BOW) integrates three resources: WordNet, English-Chinese Translation Equivalents Database (ECTED), and SUMO (Suggested Upper Merged Ontology). The three resources were originally linked in two pairs: WordNet 1.6 was manually mapped to SUMO (Niles & Pease 2003) and also to ECTED (the English lemmas in WordNet were mapped to their Chinese...
Bilingual Word Embeddings for Phrase-Based Machine Translation
We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic simil...
Journal title:
- JSW
Volume 10, Issue
Pages -
Publication date: 2015